Image compression for both still and moving images is an extremely important area of investigation, with 
numerous applications to videoconferencing, interactive education, home entertainment, and potential 
applications to earth observation, medical imaging, digital libraries, and many other areas. 

In this paper we describe our work on a neural network methodology to compress/decompress still and 
moving images. We use the “point-process” type neural network model we have developed [12, 13, 16] 
which is closer to biophysical reality than standard models, and yet is mathematically much more tractable. 
We currently achieve compression ratios of the order of 120 : 1 for moving grey-level images, based on a 
combination of motion detection and compression. The observed Signal-to-Noise Ratio (SNR) varies from values 
above 25 to more than 35. Our method is computationally fast so that compression and decompression can 
be carried out in real-time. It uses the adaptive capabilities of a set of neural networks so as to select varying 
compression ratios in real-time as a function of quality achieved. It also uses a motion detector which will 
avoid retransmitting portions of the image which have varied little from the previous frame. 

Further improvements can be achieved by using on-line learning during compression, and by appropriate 
compensation of non-linearities in the compression/decompression scheme. We expect to go well beyond the 
250 : 1 compression level for color images with good quality levels. 
1 Introduction 


As the volume of imaging data increases exponentially in a very wide variety of applications - including 
remote sensing, earth observation, medical imaging, digital libraries and documents, HDTV, entertainment 
and film, and videoconferencing - and as the needs for storing, retrieving and transmitting images expand, 
digital image compression is becoming an even more crucial technology. Many of these application areas 
- including earth observation, videoconferencing and many military applications - deal with sequences of 
images which represent some form of motion. For instance, sequences of pictures taken by a satellite each 
time it passes over nearly the same stretch of territory, after appropriate repositioning and compensation, 
are successive instances of the same scene containing changes due to the motion of objects (vehicles, for 
instance), or due to changing meteorological conditions. Thus compression can take great advantage of the 
fact that image sequences need only keep track of changes which occur from one frame to the next. 

In some areas (such as medical imaging) it is more customary to deal with grey-level images. In other areas of 
application, one deals overwhelmingly with colour images (as in entertainment). The quality of a processed 
or compressed image is judged quite differently depending on whether one deals with grey-level or with colour. In the 
case of colour, acceptable image quality will largely depend on the application. For instance, in HDTV one 
would be unhappy with a change in skin pigmentation (a greenish face does not look too good ...), while the 
change in a dress’ colour may not matter too much. 

Lossless compression is adequate when low compression ratios are acceptable. Very substantial compression 
ratios can only be achieved with lossy compression schemes. Many applications will accept lossy compression, 
as long as the resulting quality is good. In some critical applications - such as medical imaging and military 
observation - loss may not be tolerated. However even in those applications, compressed versions of archival 
images may be conveniently used for remote interrogation and fast access. The aim of image compression 
is to encode images or image sequences into as few bits as possible with a decoding mechanism which 
reconstructs the original image with an acceptable visual and/or informational quality. Another issue in 
image compression and decompression is its speed, especially in real-time applications, or in those in which 
the rate at which the source produces data is very high. It is therefore often important to be able to carry 
out compression and decompression “on the fly” without additional delay in conveying the image. 

In this paper we will describe a method for compressing and decompressing still and moving images. For 
moving image sequences of grey-level images, we obtain better than 110 : 1 compression levels with 20 to 30 
Signal to Noise Ratio (SNR). We use a learning algorithm for the “random neural network” model (Gelenbe 
1989, 1990, 1993 [12, 13, 16]; see footnote 1) to “teach” a set of networks to compress at different compression levels. A 
schematic representation of the complete method we propose is shown in Figure 1. The method uses a 
simple motion detection scheme, together with the set of learning neural networks for compression and 
decompression. 

In the sequel we first describe the problem, then review the literature, after which we describe our method 
together with measurements of the resulting compression levels and of the SNR of reconstructed images. 
We also provide an indication of the data transmission rates for the schemes we develop. This last metric 
is particularly relevant when images are transferred over networks, since the nature of the traffic determines 
the performance levels which can be expected and the appropriate traffic controls which may have to be 
imposed. 

Footnote 1: This model has also been successfully applied to other problems, including optimization [15] and image texture analysis 
and reconstruction [3, 4]. 
Figure 1: Block diagram of the complete compression scheme. 

1.1 The Image Compression Problem 



An image I is described by a function f : Z x Z → {0, 1, ..., 2^k − 1}, where Z is the set of natural 
numbers and k is the maximum number of bits used to represent the gray level of each pixel. In other 
words, f is a mapping from discrete spatial coordinates (x, y) to gray level values. Thus, M x N x k bits 
are required to store an M x N digital image. The aim of digital image compression is to develop a scheme 
to encode the original image I into the fewest number of bits such that the image I' reconstructed from this 
reduced representation through the decoding process is as similar to the original image as possible: i.e. the 
problem is to design a COMPRESS and a DECOMPRESS block so that I ≈ I' and |I_c| ≪ |I|, where I_c is the 
compressed representation and |.| denotes the size in bits (Figure 2). 



Figure 2: Image Compression Block Diagram 

The similarity measure can vary for each application. Some applications may require the reconstructed image 
to be exactly the same as the original image, in which case the process is called lossless compression. In lossy 
compression, the peak signal-to-noise ratio (SNR) is used as the measure of similarity or of dissimilarity, 
although it does not necessarily reflect visual quality. Assuming that the original and reconstructed images 
are represented by functions f(x,y) and g(x,y) of the pixel plane position (x,y), respectively, the SNR is 
defined by: 

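$$\mathrm{SNR} = 10 \log_{10} \frac{(2^k - 1)^2}{e_{rms}} \qquad (1)$$

where the error term e_rms, defined here as the mean of the squared pixel errors, is 

$$e_{rms} = \overline{e^2} = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ g(x,y) - f(x,y) \right]^2 \qquad (2)$$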
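
The SNR measure of equations (1) and (2) can be computed directly from two images. The following is a minimal sketch, assuming images are stored as lists of rows of integer gray levels; the function name `psnr` and this storage layout are illustrative choices, not part of the original method.

```python
import math

def psnr(original, reconstructed, k=8):
    """Peak SNR in dB between two equal-sized grey-level images.

    Images are lists of rows of integers in [0, 2**k - 1] (an assumed layout).
    """
    rows, cols = len(original), len(original[0])
    # Mean of the squared pixel errors, as in equation (2).
    mse = sum(
        (g - f) ** 2
        for row_f, row_g in zip(original, reconstructed)
        for f, g in zip(row_f, row_g)
    ) / (rows * cols)
    if mse == 0:
        return math.inf  # identical images, i.e. lossless reconstruction
    peak = 2 ** k - 1
    # Equation (1): squared peak value over the mean squared error.
    return 10 * math.log10(peak ** 2 / mse)

# Usage: a 2 x 2 toy image and a slightly perturbed copy.
f = [[10, 20], [30, 40]]
g = [[11, 19], [30, 42]]
print(round(psnr(f, g), 2))
```

Returning infinity for identical images is a convention chosen here so that the lossless case does not raise a division error.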


When moving images are concerned, the compression ratio may vary dynamically with the specific image 
or image portion being transmitted, since some advantage will be taken of the existence or non-existence 
of significant motion in successive image frames. However the SNR metric will still be relevant to the 
evaluation of the resulting quality. 
1.2 State-of-the-Art in Still and Moving Image Compression 


Image compression research generally addresses the basic trade-off between the reconstruction quality of 
the compressed image, the compression ratio, and the complexity and speed of the compression algorithm. 
The two currently accepted standards for still and moving image compression are JPEG ([34]) and MPEG 
([25]). These schemes provide high compression ratios with good picture reconstruction qualities. However, 
the amount of computation required for both is generally too high for real-time applications. MPEG 
uses the following techniques: 1) conversion from RGB color space coding to YCrCb coding, which gives an automatic 2:1 
compression ratio, 2) JPEG encoding based on the discrete cosine transform and quantization followed by some 
lossless compression, which yields compression ratios as high as 30:1 with good image quality, and 3) motion 
compensation, in which a frame can be encoded in terms of the previous and next frames. However, these 
techniques severely limit the speed at which a sequence of images can be compressed. 

Two classical techniques for still image compression are transform and sub-band encoding. In transform 
coding techniques the image is subdivided into small blocks, each of which undergoes some reversible linear 
transformation (Fourier, Hadamard, Karhunen-Loeve, etc.) followed by quantization and coding based on 
reducing redundant information in the transformed domain. In subband coding ([35]), an image is filtered to 
create a set of images, each of which contains a limited range of spatial frequencies. These so-called subbands 
are then downsampled, quantized and coded. These techniques require much computation. Another common 
image compression method is vector quantization ([18]) which can achieve high compression ratios. A vector 
quantizer is a system for mapping a stream of analog or very high rate or volume discrete data into a 
sequence of low volume and rate data suitable for storage in mass memory, and communication over a digital 
channel. This technique mainly suffers from edge degradation and high computational complexity. Although 
some more sophisticated vector quantization schemes have been proposed to reduce edge effects ([30]), the 
computation overhead still exists. Recently, novel approaches have been introduced based on pyramidal 
structures [1], wavelet transforms [36], and fractal transforms [20]. These and some other new techniques 
[24] inspired by the representation of visual information in the brain, can achieve high compression ratios 
with good visual quality but are nevertheless computationally intensive. 

The speed of compression/decompression is a major issue in applications such as videoconferencing, HDTV 
and videophones, which are all likely to be a part of daily life in the near future. Artificial neural 
networks [31] are being widely used as alternative computational tools in many applications. This popularity 
is mainly due to the inherently parallel structure of these networks and to their learning capabilities which 
can be effectively used for image compression. 

Several researchers have used the Learning Vector Quantization (LVQ) network [23] for developing codebooks 
whose distribution of codewords approximates the probabilistic distribution of data which is to be presented. 
A Hopfield network for vector quantization which achieves compression of less than 4:1 is reported in [27]. A 
Kohonen net method for codebook compression is demonstrated in [29]; it seems to perform slightly better 
than another standard method of generating codebooks. Cottrell et al. ([8]) train a two-layer perceptron 
with a small number of hidden units to encode and decode images, but do not report encouraging results 
about the performance of the network on previously unseen images. Using neural encoder/decoders has 
been suggested by many researchers such as [6]. In [10], the authors present a neural network method for 
finding coefficients of a 2-D Gabor transform. This 2-way function can then be quantized and encoded to 
give good images at compression of under 1 bit/pixel, and as low as 0.38 bits/pixel with good image quality 
in a particular case. 

A feed-forward neural network model to achieve 16 : 1 compression of untrained images with SNR = 26.9 dB 
is presented in [26] by using four different networks to encode different “types” of images. A backpropagation 
network to compress data at the hidden layer and an implementation on a 512 processor NCUBE are 
discussed in [32]. In [19], the authors perform a comparison of backpropagation networks with recirculation 
networks and the DCT (discrete cosine transform). The best results reported here are obtained with the 
DCT, then with recirculation networks and finally with backpropagation networks. An interesting feature 
of this paper is that they show the basis images for the neural networks, which allows one to compare the 
underlying matrix transformations of the neural networks to that of the DCT. In [11], the authors present 
a VLSI implementation of a neuro vector quantization/codebook algorithm. In [28], the authors use a 
backpropagation-based nested training algorithm to do compression. For images on which the network has already 
been trained (a case of little practical use) the compression ratios and resulting qualities are 
as follows: 8:1 (SNR = 22.89 dB), 64:1 (SNR = 15.15 dB) to 256:1 (SNR = 10.44 dB). For previously “unseen” 
images, results are given with the following ratios and qualities: 8:1 (SNR = 18.13 dB) to 64:1 (SNR = 12.93 dB). 
Our own results for “unseen” images provide substantially better quality, especially at the lower compression 
ratios (8:1 and 16:1). In [22], the authors suggest the use of a non-linear mapping function whose parameters 
are learned in order to achieve better image compression in a standard backpropagation network. 

Motion detection and compensation are key issues when one deals with moving images. Motion compensation 
provides for a great deal of the compression in the MPEG standard. By using motion compensation, MPEG 
can code the blocks in a frame in terms of motion vectors for the blocks in the previous and/or next 
frames. To perform this coding, motion must be estimated using block matching over the area local to the block under 
consideration. Exhaustive searches which consider all possible motion vectors yield good results. However 
for large ranges, the cost of such a search becomes prohibitive and heuristic searches must be used. This 
also raises the problem that full motion compensation cannot be performed in real time since it requires the 
future frame to be known in advance. Partial motion compensation, in which blocks may be encoded only 
in terms of blocks in the previous frame, may be used. One should also note that the MPEG standard does 
not specify the method of motion compensation to be used, and that a neural solution to the motion compensation 
problem in two dimensions has been examined. In [9], a neural network for motion detection is presented; 
however it only works for a one dimensional case and the authors state that problems arise when the approach 
is extended to two dimensional detection of edge motion. It appears this approach would involve a great 
deal of research before it could be usefully applied in moving picture compression. In [7], a neural network 
method for motion estimation is presented. Drawbacks include the assumption that displacement is uniform 
in the area of interest. This would be a problem in trying to estimate the motion of a human being in which 
motion vectors differ over subsets of the picture. 


2 Still Image Compression with the Random Neural Network 


One of the common neural approaches in image compression is to train a network to encode and decode the 
input data [8], so that the resulting difference between input and output images is minimized. The network 
consists of an input layer and an output layer of equal sizes, with an intermediate layer of smaller size in 
between. The ratio of the size of the input layer to the size of the intermediate layer is - of course - the 
compression ratio. More generally, there can also be several intermediate layers. The network is usually 
trained on one or more images so that it develops an internal representation corresponding not to the image 
itself, but rather to the relevant features of a class of images. 

In our approach, the input, intermediate and output images are each subdivided into equal-sized blocks and 
compression is carried out block by block (see Figure 3). This has the desirable effect of reducing the network 
learning time. It also achieves good generalization, since the blocks comprising a single test image are used 
as the training set. The amount of information representing the compression and decompression algorithm 
(i.e. the “weights”) is also substantially reduced in this manner. We use a feedforward encoder/decoder 
random neural network with one intermediate layer as shown in Figure 8. The weights between the input 
layer and the intermediate layer correspond to the encoding or compression process, while the weights from 
the intermediate to the output layer correspond to the decoding or decompression process. 
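
The block-by-block organisation can be sketched as follows. This is a minimal illustration, assuming 8 x 8 blocks and image dimensions that are multiples of the block size; the helper names are hypothetical. It also shows how the size of the intermediate layer follows from the compression ratio, under the assumption that each intermediate neuron output is stored with the same number of bits as a pixel.

```python
def split_into_blocks(image, size=8):
    """Return (row, col, block) triples; each block is a flat list of size*size pixels.

    The image dimensions are assumed to be multiples of `size`.
    """
    blocks = []
    for r in range(0, len(image), size):
        for c in range(0, len(image[0]), size):
            block = [image[r + i][c + j] for i in range(size) for j in range(size)]
            blocks.append((r, c, block))
    return blocks

def assemble_from_blocks(blocks, height, width, size=8):
    """Inverse of split_into_blocks: place each flat block back at (row, col)."""
    image = [[0] * width for _ in range(height)]
    for r, c, block in blocks:
        for i in range(size):
            for j in range(size):
                image[r + i][c + j] = block[i * size + j]
    return image

def intermediate_layer_size(block_pixels, ratio):
    """Neurons in the intermediate layer for a given input size and compression ratio."""
    return block_pixels // ratio

# Usage: a 64-input encoder at 16:1 needs 4 intermediate neurons.
print(intermediate_layer_size(64, 16))
```

Because every block passes through the same encoder/decoder pair, only one set of weights is needed regardless of the image size.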

Figure 3: Compression of an arbitrarily large image using a neural encoder/decoder 

Our current results use 8x8 boxes, where each element is a byte. We encode the 8-bit gray level values as 
real numbers between 0 and 1, i.e. we map the [0, 255] interval into the [0, 1] interval, since the grey level of 
each image pixel is transformed into a real-valued excitation level of a neuron (and vice-versa). The network 
is trained so as to minimize the squared error between the output and input values, thus maximizing the 
SNR, with the proviso that the image SNR is measured for quantized values in [0, 255] while the neural 
network learning uses the corresponding real-valued network parameters. In all the results we report, both 
in this section and when we deal with moving images, our networks are trained using the algorithm described 
in [16] using a single image: the well-known 512 x 512 8-bit Lena. Indeed, we have found that Lena provides 
some of the best results for training the network. The network is then tested for a variety of images, and we 
have observed a reconstruction quality ranging from SNR = 23 dB to more than 30 dB for 16 : 1 compression 
(i.e. 0.5 bits/pixel). As an example, Figure 4 shows our results with 16 : 1 compression for the 512 x 512 
8-bit Peppers image [17]. 
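
The mapping between gray levels and excitation levels, and the relation between compression ratio and bits per pixel, can be written out as a short sketch. The function names are illustrative, and clipping the network output to [0, 1] before quantization is an assumption made here, not a step stated in the text.

```python
def to_excitation(pixel, k=8):
    """Map a gray level in [0, 2**k - 1] to a real excitation level in [0, 1]."""
    return pixel / (2 ** k - 1)

def to_gray_level(value, k=8):
    """Map a real-valued network output back to a quantized gray level.

    Values outside [0, 1] are clipped first (an assumption of this sketch).
    """
    peak = 2 ** k - 1
    clipped = min(1.0, max(0.0, value))
    return int(round(clipped * peak))

def bits_per_pixel(k, ratio):
    """Average bits per pixel after compressing k-bit pixels at the given ratio."""
    return k / ratio

# Usage: 8-bit pixels at 16:1 give 0.5 bits/pixel, as quoted in the text.
print(bits_per_pixel(8, 16))
```

The SNR is then measured on the output of `to_gray_level`, i.e. on quantized values, while training works on the real-valued quantities.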



Figure 4: Test results for 16 : 1 compression (0.5 bit/pixel) with random neural network. Panels: original Peppers image; reconstruction with SNR = 27.82. 


2.1 Motion Detection 


In many applications such as videoconferencing, sequences of image frames representing a moving scene are 
transmitted. Often, a substantial part of an image, such as the background, basically does not move - 
except for noise which may originate at various levels, including the imaging devices. On the other hand, 
the objects in the image move relative to the background, but this displacement may be quite small between any 
two successive frames. We use these facts in order to perform motion detection. Specifically we examine the 
8x8 boxes from successive frames F_{i-1}, F_i. Motion is sensed if the average grayscale value of a box in F_i 
differs from that of the corresponding box in frame F_{i-1} by more than a certain amount d. We have observed 
experimentally that the difference in the average grayscale value of a block that is perceptible to the human 
eye is around d = 1. Note that the box structure used throughout our compression scheme makes 
this approach possible as long as the box size is small enough. Indeed, a large box size would either make it 
highly improbable that motion has not occurred within any given box, or would render the detection process 
insensitive if accompanied by a large value of d. 
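
The detection rule described above can be sketched in a few lines. This is a minimal illustration, assuming frames stored as lists of rows of gray levels whose dimensions are multiples of the box size; the function names are hypothetical.

```python
def block_mean(frame, r, c, size=8):
    """Average gray level of the size x size box whose top-left corner is (r, c)."""
    total = sum(frame[r + i][c + j] for i in range(size) for j in range(size))
    return total / (size * size)

def moving_blocks(prev_frame, new_frame, d=1.0, size=8):
    """Return the (row, col) corners of boxes whose mean gray level changed by more than d."""
    moved = []
    for r in range(0, len(new_frame), size):
        for c in range(0, len(new_frame[0]), size):
            diff = abs(block_mean(new_frame, r, c, size)
                       - block_mean(prev_frame, r, c, size))
            if diff > d:
                moved.append((r, c))
    return moved

# Usage: only the lower-right box of a 16 x 16 frame brightens by 10 levels.
prev = [[0] * 16 for _ in range(16)]
new = [row[:] for row in prev]
for i in range(8, 16):
    for j in range(8, 16):
        new[i][j] = 10
print(moving_blocks(prev, new, d=1.0))
```

Since only box averages are compared, a change confined to a few pixels can go undetected: a single pixel changing by 32 levels shifts the mean of an 8 x 8 box by only 0.5.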

We use the first 101 frames of two gray-level image sequences, Miss America and Salesman, to test our motion 
detector. Each frame is of size 360 x 288 yielding 1620 8 x 8 boxes. To test the motion detector, we load 
the first two frames into two arrays. Array 1 contains the frame which is on the screen at the receiving 
end of the transmission, while Array 2 is the new frame. Each 8x8 box in the frames is tested for motion 
detection. If a box is classified as changed, the box in Array 1 is replaced by the box in Array 2. Once 
all of the boxes are tested, the next frame is loaded into Array 2, and the process is repeated. Clearly, the 
parameter d will influence both the compression ratios and the resulting image quality. In order to illustrate 
its effect on compression we have run a series of tests summarized in Table 1. In the tabulated information 
note that the “Total Compression Ratio” is derived from the size of the whole video sequence after motion 
detection, whereas the “Steady State Compression Ratio” is the average compression ratio due to motion 
detection over all the frames after the complete first frame has been transmitted. Both values do include the 
overhead due to the additional bits sent for each box of each frame: two bytes to indicate x and y indices 
of the block in that frame. For storage applications, a simpler and possibly more efficient scheme with one 
bit per block can be used: a bit value of “1” means that motion is detected in the box and that it will be sent, 
while “0” means that the box will not be sent (and therefore that the previous frame’s corresponding box 
should be used). However, considering network applications, we will prefer the former header so that the 
image transmission will not be sensitive to packet losses. 
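
The cost of the two header schemes can be compared with simple arithmetic. The sketch below assumes uncompressed 8 x 8 boxes of 8-bit pixels (as in the motion-detection-only experiments) and the 1620 boxes per frame quoted above; the function names are illustrative.

```python
BLOCK_BITS = 8 * 8 * 8  # one uncompressed 8 x 8 box of 8-bit pixels

def bits_with_position_header(n_sent):
    """Bits per frame when each transmitted box carries two bytes of x, y indices."""
    return n_sent * (BLOCK_BITS + 16)

def bits_with_bitmap_header(n_sent, n_blocks):
    """Bits per frame when one flag bit is sent for every box, moved or not."""
    return n_blocks + n_sent * BLOCK_BITS

def compression_ratio(raw_bits, sent_bits):
    """Ratio of the raw frame size to the number of bits actually sent."""
    return raw_bits / sent_bits

# Usage: a 360 x 288 frame (1620 boxes) in which 200 boxes show motion.
n_blocks, n_sent = 1620, 200
raw = n_blocks * BLOCK_BITS
print(compression_ratio(raw, bits_with_position_header(n_sent)))
print(compression_ratio(raw, bits_with_bitmap_header(n_sent, n_blocks)))
```

Under these assumptions the position header costs 16 bits per transmitted box while the flag scheme costs a fixed 1620 bits per frame, so the position header is the cheaper of the two only when at most 101 boxes are sent in a frame.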
Table 1: Compression ratios obtained only by motion detection: as a function of difference threshold d 


Other results are presented in the form of the actual images before and after motion detection. Figure 5 
shows the original and the reconstructed 101st -and last- frame of the sequence with d = 1. In Figure 6(a), 
the SNR is plotted as a function of frame number for d = 1. Similarly Figure 6(b) shows the number of 
bits transmitted as a function of frame number. From these results and other experiments we have run, 
it appears that a compression ratio of 6 or 7 can be obtained easily with a value of d close to or slightly 
above 1, with satisfactory image quality, when only motion detection is used for compression. In the next 
section this scheme will be combined with the actual neural compression of frames in order to achieve high 
compression ratios and satisfactory image quality. 
Figure 5: Original and reconstructed last frames (101st frames) in the Salesman sequence using the motion 
detection scheme with d = 1. Panels: original 101st frame; reconstructed frame (SNR = 38.21). 





Figure 6: Experimental results for motion detection with d = 1: a) PSNR as a function of frame number, b) 
Number of bits transmitted as a function of frame number 


3 Compression for Moving Images 


In this section we will describe and evaluate the complete compression scheme for video sequences of natural 
images, using a combination of the motion detection scheme described earlier together with our adaptive 
block-by-block still image compression/decompression based on the random neural network (Figure 3). Specifically, our compression 
scheme uses three networks: 


• The first network scans successive boxes (fixed size portions of the image) in sequence, and identifies 
those boxes where motion has taken place, as described above. If a box is considered to be identical 
to the same box in the previous frame, it is not compressed or transmitted. 

• The second network carries out compression of the box which is identified by the first network. In fact 
the second network is a set of distinct neural compression networks C_1, ..., C_L which are designed to 
achieve different compression levels. Each of these networks compresses the box in parallel. The choice 
of the compression level to be selected is carried out by the third network. 

• The third network simulates the decompression, and provides a measure of the “quality” of the 
compression-decompression. In fact it is composed of L distinct decompression networks D_1, ..., D_L, 
where D_i matches C_i. 

Then the pair C_i, D_i which yields the highest compression ratio at a quality level of Q or better, where Q is chosen to be 
acceptable for the particular application, is selected and the compressed box is transmitted. For grey-level 
images Q is formulated as an SNR value. Figure 7 shows the block diagram of the adaptive still image 
compression network. Note that with the exception of the initial learning phase, all the operations which 
have been outlined above can be carried out “on-the-fly”, i.e. in real-time as each box goes through the 
transmitter, and as each compressed box goes through the receiver. (See Figure 1 for a block diagram of the 
total proposed scheme). 
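
The selection rule can be sketched as follows. This is only an illustration of the control logic: the pairs built by `make_averaging_pair` are simple averaging codecs standing in for trained C_i, D_i networks, the candidates are tried sequentially rather than in parallel, and falling back to the lowest-ratio pair when no pair reaches Q is an assumption of this sketch.

```python
import math

def block_snr(block, rebuilt, k=8):
    """SNR in dB between a flat block and its reconstruction."""
    mse = sum((g - f) ** 2 for f, g in zip(block, rebuilt)) / len(block)
    return math.inf if mse == 0 else 10 * math.log10((2 ** k - 1) ** 2 / mse)

def make_averaging_pair(ratio):
    """Hypothetical stand-in for a (C_i, D_i) pair; NOT a neural network."""
    def compress(block):
        # Replace each run of `ratio` pixels by its rounded average.
        return [round(sum(block[i:i + ratio]) / ratio)
                for i in range(0, len(block), ratio)]
    def decompress(code):
        # Repeat each code value `ratio` times.
        return [value for value in code for _ in range(ratio)]
    return ratio, compress, decompress

def select_pair(block, pairs, q):
    """Return (ratio, code) for the highest-ratio pair whose SNR is at least q."""
    ordered = sorted(pairs, key=lambda p: p[0], reverse=True)
    for ratio, compress, decompress in ordered:
        code = compress(block)
        # The transmitter simulates decompression to predict receiver quality.
        if block_snr(block, decompress(code)) >= q:
            return ratio, code
    # Assumption: when no pair reaches q, use the lowest compression ratio.
    ratio, compress, _ = ordered[-1]
    return ratio, compress(block)

# Usage: a flat box compresses at 32:1, a striped box needs 8:1 to reach Q = 30.
pairs = [make_averaging_pair(r) for r in (8, 16, 32)]
print(select_pair([100] * 64, pairs, 30)[0])
print(select_pair(([0] * 8 + [255] * 8) * 4, pairs, 30)[0])
```

Because the transmitter runs the same decompressor as the receiver, the quality it measures is the quality the receiver will actually obtain.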

Another refinement would be to use the network D_i (which is stored both at the transmitting end and at 
the receiving end) to further train the network C_i in on-line mode. In this case, D_i's weights will not be 
changed, and only C_i's weights are updated. 



Figure 7: Block diagram of the adaptive still image compression network 


At the receiving or decompression end, if the transmitter has sent a 0 bit to indicate that the current box is 
identical to the same box in the previous frame, then the previous frame’s box is placed in the corresponding 
position of the output image. Otherwise the compressed box is received. Implicitly (through the box’s size) 
or explicitly (via some variable i which would accompany the box) the compression level used is known to the 
receiver. We then use the network D_i to decompress the box, which is subsequently placed in appropriate 
sequence into the output image. The relationship between any two compression/decompression networks 
C_i, D_i is shown in Figure 8. 
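
The receiver logic can be sketched as below. The record layout (row, column, compression level, code) and the replication decoder are assumptions made for illustration; in the scheme described here the decoders would be the trained networks D_i, and boxes that were not transmitted simply keep their previous contents.

```python
def receive_frame(previous_frame, records, decoders, size=8):
    """Build the new output frame from the previous one and the received boxes.

    records: list of (row, col, level, code) tuples, one per transmitted box.
    decoders: dict mapping a compression level to a function code -> flat block.
    """
    frame = [row[:] for row in previous_frame]  # untransmitted boxes are reused
    for r, c, level, code in records:
        block = decoders[level](code)  # decompress with the matching decoder
        for i in range(size):
            for j in range(size):
                frame[r + i][c + j] = block[i * size + j]
    return frame

# Usage with a hypothetical 8:1 decoder that repeats each code value 8 times.
decoders = {8: lambda code: [v for v in code for _ in range(8)]}
prev = [[0] * 16 for _ in range(8)]
out = receive_frame(prev, [(0, 8, 8, [10, 11, 12, 13, 14, 15, 16, 17])], decoders)
print(out[0])
```

The previous frame is copied rather than modified in place, so the caller can still compare successive output frames.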


Figure 8: A Neural Network Compression/Decompression Pair (N input neurons, K intermediate neurons with K ≪ N, N output neurons) 
3.1 Experimental Results for Moving Image Compression 

We have experimented with the combined scheme using three still image compression machines (L = 3 with 8 : 1, 
16 : 1 and 32 : 1 compression/decompression pairs), and have tested it on the 101-frame Miss America and 
Salesman grey-level image sequences. Table 2 summarizes the results we have obtained for Q = 30. 
Table 2: Compression ratios obtained by the combination of motion detection and still image compression 
with Q = 30: as a function of difference threshold d 


In Figure 9 we show the original and the reconstructed 101st frame of Miss America using the complete 
scheme described above with d = 1.5 and Q = 30. Figure 10 indicates the variation of compression ratio 
over time. Figure 11 shows the running average compression ratios and the running average bits per pixel 
for a runlength of 1000, based on the Miss America sequence with d = 2 and Q = 30. In Figure 12(a), PSNR is 
plotted as a function of frame number for d = 2, Q = 30. Figure 12(b) shows the number of bits transmitted 
as a function of frame number. 



Figure 9: Original and reconstructed last frames (101st frames) in the Miss America sequence using the 
motion detection scheme with d = 1.5 combined with still image compression with Q = 30 


Figure 10: Total average compression ratio as a function of block number for the combined scheme with 
d = 2 and Q = 30 

Figure 11: Experimental results with Miss America sequence using the combined scheme with d = 2 and 
Q = 30: a) Running average compression ratio as a function of block number, b) Running average bits 
per pixel as a function of block number 

Figure 12: Experimental results for the combined scheme with d = 2 and Q = 30: a) PSNR as a function 
of frame number, b) Number of bits transmitted as a function of frame number 


4 Discussion and Conclusions 


Many further improvements of the basic method we propose can be thought of and some are certainly 
worth further work. In particular the following observations can be used to design networks with enhanced 
compression capabilities: 


• The random neural network learning algorithm (described in the Appendix) applies to arbitrary 
recurrent networks. Hence, instead of restricting ourselves to fully feedforward networks, we can use 
feedback connections between the compressed and input layer, and the output layer and the 
compressed layer. Further feedback is possible and useful locally within the output layer. Such feedback 
can help the network find better compression/decompression parameters. 

• The quality level (e.g. SNR) predicted at the transmitting end is exactly what the result is for that 
box, after it is decompressed at the receiver, since the networks D_1, ..., D_L are identical both at the 
transmitter and receiver. Thus we propose to update the weights of the neural networks C_1, ..., C_L 
constantly using gradient descent to improve performance with each individual box. This will be 
detrimental to the “real-time” nature of the whole approach we propose, but would be worth examining 
in order to obtain much higher SNR figures. 

• It is also possible to store all of the compression networks C_1, ..., C_L at the receiver - as well as at 
the transmitter. Then, on-going improvement via learning as compression/decompression takes place 
can be carried out periodically for both compression and decompression networks, at the expense of 
transmitting some uncompressed frames or boxes from time to time. 

• Initial learning of weights can be carried out at the transmitter, or receiver, or both at the transmitter 
and receiver, or off-line. The resulting weights would then be loaded into the transmitter and the 
receiver. Note that if the sample images used for learning are known both to the transmitter and to 
the receiver, then a quasi-identical set of weights (with the exception of possibly different numerical 
round-off errors) can be obtained both at the transmitter and at the receiver. Thus, the images to be 
used as a basis for learning can be transmitted from time to time (i.e. infrequently) from one to the 
other in order to improve the system’s compression capabilities. 

• All the work described in this paper needs to be extended to colour images. Currently, learning of 
the weights of each C_i, D_i pair is obtained using gradient descent, and the SNR used as a 
performance criterion is essentially equivalent to a quadratic cost function. We would use other cost 
metrics (such as LAS-type measures) to carry out learning for colour images. 


In addition to the general scheme described above, we will examine some other enhancements related to the 
non-linearity of the input-output amplitude mapping of the compression/decompression scheme. We expect 
to obtain further quality improvement with appropriate compensation of non-linearity. This compensation 
can also be part of the learning scheme. Moreover, the adaptive selection of the level of compression to 
be used at the transmitter side can be improved by making use of the state of the transmission medium - 
specifically of the network being used. This would be particularly relevant if we are dealing with an ATM 
(Asynchronous Transfer Mode) network. The adaptive decision can be based on feedback about network 
state - such as current load on the network - as well as SNR and/or visual quality metrics. For example, 
in case of little load on the network, we can favor small compression ratios, thus increasing visual quality. 
Similarly, in case of a heavily loaded network, we can sacrifice visual quality and transmit with maximal 
compression. This adaptive decision can also be learned. 
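
The load-dependent choice of compression level described above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the function name `choose_compression_ratio`, the candidate ratios, the normalisation of network load to the interval [0, 1], and the SNR floor of 25 are hypothetical and are not part of the scheme reported in this paper.

```python
from typing import Dict, Sequence


def choose_compression_ratio(
    ratios: Sequence[int],
    network_load: float,
    measured_snr: Dict[int, float],
    min_snr: float = 25.0,  # assumed quality floor, for illustration only
) -> int:
    """Pick a compression ratio from network load and per-ratio SNR feedback.

    `network_load` is assumed to be normalised to [0, 1].
    `measured_snr` maps a ratio to the SNR last observed at that ratio.
    """
    ordered = sorted(ratios)
    load = min(max(network_load, 0.0), 1.0)

    # Light load favours small ratios; heavy load favours large ratios.
    idx = min(int(load * len(ordered)), len(ordered) - 1)

    # Unless the network is saturated, back off towards smaller ratios
    # while the quality observed at the candidate ratio is below the floor.
    while (
        idx > 0
        and load < 1.0
        and measured_snr.get(ordered[idx], float("inf")) < min_snr
    ):
        idx -= 1
    return ordered[idx]


# Hypothetical candidate ratios and SNR feedback.
candidate_ratios = [4, 8, 16, 32]
snr_feedback = {4: 35.0, 8: 31.0, 16: 27.0, 32: 22.0}
for load in (0.1, 0.9, 1.0):
    print(load, choose_compression_ratio(candidate_ratios, load, snr_feedback))
```

A fixed rule of this kind is only a starting point; as noted above, the decision itself could be learned.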

With some of the improvements described above, we expect to achieve compression ratios better than 250 : 1 
for grey-level moving image sequences, and still higher levels for colour, with quality levels of the order of 
SNR = 30 for grey level images, and acceptable LAS-type measures and SNR levels for colour images. 

5 Appendix: The Random Neural Network Model and its Learning Algorithm 


In this appendix we provide a summary of the Random Neural Network Model and of its Learning Algorithm, 
in order to provide a theoretical background for the techniques which are used in this paper. 


5.1 The Random Neural Network Model 


In the random neural network model (Gelenbe (1989,90) [12, 13]) signals in the form of spikes of unit 
amplitude circulate among the neurons. Positive signals represent excitation and negative signals represent 
inhibition. Each neuron's state is a non-negative integer called its potential, which increases when an 
excitation signal arrives at it, and decreases when an inhibition signal arrives. Thus, an excitatory spike is 
interpreted as a "+1" signal at a receiving neuron, while an inhibitory spike is interpreted as a "-1" signal. 

Neural potential also decreases when the neuron fires. Thus a neuron i emitting a spike, whether it be an 
excitation or an inhibition, will lose one unit of potential, going from some state whose value is $k_i$ to the 
state of value $k_i - 1$. 

The state of the n-neuron network at time t is represented by the vector of non-negative integers 
$k(t) = (k_1(t), \ldots, k_n(t))$, where $k_i(t)$ is the potential or integer state of neuron i. We will denote by 
$k$ and $k_i$ arbitrary values of the state vector and of the i-th neuron's state. 

Neuron i will "fire" (i.e. become excited and send out spikes) if its potential is positive. The spikes will then 
be sent out at a rate $r(i)$, with independent, identically and exponentially distributed inter-spike intervals. 
Spikes will go out to some neuron j with probability $p^+(i,j)$ as excitatory signals, or with probability 
$p^-(i,j)$ as inhibitory signals. A neuron may also send signals out of the network with probability $d(i)$, and 
$d(i) + \sum_{j=1}^{n} [p^+(i,j) + p^-(i,j)] = 1$. Let $w^+_{ij} = r(i) p^+(i,j)$, and $w^-_{ij} = r(i) p^-(i,j)$. Here the "w's" 
play a role similar to that of the synaptic weights in connectionist models, though they specifically represent 
rates of excitatory and inhibitory spike emission. They are non-negative. Exogenous (i.e. those coming from 
the "outside world") excitatory and inhibitory signals also arrive at neuron i at rates $\Lambda(i)$, $\lambda(i)$, respectively. 

This is a “recurrent network” model, i.e. a network which is allowed to have feedback loops, of arbitrary 
topology. 
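
The dynamics just described can be checked numerically in the simplest case. The following is a minimal sketch, not the authors' code: it simulates a single isolated neuron whose spikes all leave the network, with exogenous excitatory rate `Lambda`, exogenous inhibitory rate `lam` and firing rate `r`. The function name, the parameter values, the time horizon and the seed are assumptions chosen for illustration. For such a neuron the long-run fraction of time with positive potential should approach `Lambda / (r + lam)`, which is the value the stationary result of Theorem 1 gives when no other neuron feeds it.

```python
import random


def simulate_single_neuron(Lambda, lam, r, horizon=20000.0, seed=0):
    """Estimate the fraction of time one isolated neuron is excited.

    Events are exponentially timed: excitatory arrivals (rate Lambda),
    inhibitory arrivals (rate lam) and firings (rate r, only when the
    potential k is positive). All emitted spikes leave the network.
    """
    rng = random.Random(seed)
    t = 0.0
    k = 0  # neuron potential, a non-negative integer
    excited_time = 0.0
    while True:
        total_rate = Lambda + lam + (r if k > 0 else 0.0)
        dt = rng.expovariate(total_rate)
        if t + dt >= horizon:
            if k > 0:
                excited_time += horizon - t
            break
        if k > 0:
            excited_time += dt
        t += dt
        if rng.random() * total_rate < Lambda:
            k += 1  # excitatory arrival
        elif k > 0:
            k -= 1  # inhibitory arrival or firing: lose one unit
        # an inhibitory arrival at k == 0 has no effect
    return excited_time / horizon


estimate = simulate_single_neuron(Lambda=1.0, lam=0.5, r=2.0)
predicted = 1.0 / (2.0 + 0.5)
print("simulated:", estimate, "predicted:", predicted)
```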

Computations related to this model are based on the probability distribution of network state 
$p(k,t) = \Pr[k(t) = k]$, or on the marginal probability that neuron i is excited, $q_i(t) = \Pr[k_i(t) > 0]$. As a 
consequence, the time-dependent behaviour of the model is described by an infinite system of 
Chapman-Kolmogorov equations for discrete state-space, continuous-time Markovian systems. 

Information in this model is carried by the frequency at which spikes travel. Thus, neuron i, if it is excited, 
will send spikes to neuron j at a frequency $w_{ij} = w^+_{ij} + w^-_{ij}$. These spikes will be emitted at exponentially 
distributed random intervals. In turn, each neuron behaves as a non-linear frequency demodulator since it 
transforms the incoming excitatory and inhibitory spike trains' rates into an "amplitude", which is $q_i(t)$, 
the probability that neuron i is excited at time t. Intuitively speaking, each neuron of this model is also 
a frequency modulator, since neuron i sends out excitatory and inhibitory spikes at rates (or frequencies) 
$q_i(t) r(i) p^+(i,j)$, $q_i(t) r(i) p^-(i,j)$ to any neuron j. 

The stationary probability distribution associated with the model is the quantity used throughout the 
computations: 

$$p(k) = \lim_{t \to \infty} p(k,t), \qquad q_i = \lim_{t \to \infty} q_i(t). \tag{3}$$

It is given by the following result: 

Theorem 1. Let $q_i$ denote the quantity 

$$q_i = \lambda^+(i) / [r(i) + \lambda^-(i)] \tag{4}$$

where the $\lambda^+(i)$, $\lambda^-(i)$ for $i = 1, \ldots, n$ satisfy the system of nonlinear simultaneous equations: 

$$\lambda^+(i) = \sum_j q_j r(j) p^+(j,i) + \Lambda(i), \qquad \lambda^-(i) = \sum_j q_j r(j) p^-(j,i) + \lambda(i). \tag{5}$$

Let $k(t)$ be the vector of neuron potentials at time t and $k = (k_1, \ldots, k_n)$ be a particular value of the vector; 
let $p(k)$ denote the stationary probability distribution, 

$$p(k) = \lim_{t \to \infty} \mathrm{Prob}[k(t) = k].$$

If a nonnegative solution $\{\lambda^+(i), \lambda^-(i)\}$ exists to equations 4 and 5 such that each $q_i < 1$, then 

$$p(k) = \prod_{i=1}^{n} [1 - q_i] \, q_i^{k_i}.$$


The quantities which are most useful for computational purposes, i.e. the probabilities that each neuron is 
excited, are directly obtained from: 

$$\lim_{t \to \infty} \mathrm{Prob}[k_i(t) > 0] = q_i = \lambda^+(i) / [r(i) + \lambda^-(i)] \quad \text{if } q_i < 1.$$
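
As an illustration of how the excitation probabilities can be obtained in practice, the sketch below solves equations 4 and 5 by successive substitution, using the fact that $q_j r(j) p^+(j,i) = q_j w^+_{ji}$. The choice of plain fixed-point iteration, the function name `stationary_excitation`, the tolerance and the two-neuron example are assumptions made here; the paper does not prescribe a particular solver, and convergence of this simple iteration is not guaranteed for every network.

```python
import numpy as np


def stationary_excitation(w_plus, w_minus, r, Lambda, lam,
                          tol=1e-12, max_iter=10000):
    """Solve q_i = lambda+(i) / [r(i) + lambda-(i)] by successive substitution.

    w_plus[j, i] and w_minus[j, i] are the excitatory and inhibitory rates
    from neuron j to neuron i; r holds the (positive) firing rates;
    Lambda and lam are the exogenous excitatory and inhibitory rates.
    """
    w_plus = np.asarray(w_plus, dtype=float)
    w_minus = np.asarray(w_minus, dtype=float)
    r = np.asarray(r, dtype=float)
    Lambda = np.asarray(Lambda, dtype=float)
    lam = np.asarray(lam, dtype=float)

    q = np.zeros_like(Lambda)
    for _ in range(max_iter):
        lambda_plus = q @ w_plus + Lambda    # sum_j q_j w+(j,i) + Lambda(i)
        lambda_minus = q @ w_minus + lam     # sum_j q_j w-(j,i) + lambda(i)
        q_new = lambda_plus / (r + lambda_minus)
        if np.max(np.abs(q_new - q)) < tol:
            if np.any(q_new >= 1.0):
                raise ValueError("some q_i >= 1: Theorem 1 does not apply")
            return q_new
        q = q_new
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical example: neuron 0 excites neuron 1 at rate 1.0, and
# neuron 1 sends all of its spikes out of the network.
q = stationary_excitation(
    w_plus=[[0.0, 1.0], [0.0, 0.0]],
    w_minus=[[0.0, 0.0], [0.0, 0.0]],
    r=[1.0, 1.0],
    Lambda=[0.5, 0.0],
    lam=[0.0, 1.0],
)
print(q)  # q_0 = 0.5 / 1 = 0.5 and q_1 = 0.5 / (1 + 1) = 0.25
```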


5.2 The Learning Algorithm 

Let us describe the learning algorithm we use in this study. It is based on the algorithm described in (Gelenbe 
93) [16]. 

The algorithm chooses the set of network parameters W in order to learn a given set of K input-output pairs 
$(\iota, Y)$ where the set of successive inputs is denoted $\iota = \{\iota_1, \ldots, \iota_K\}$, and $\iota_k = (\Lambda_k, \lambda_k)$ are pairs of 
positive and negative signal flow rates entering each neuron: 

$$\Lambda_k = [\Lambda_k(1), \ldots, \Lambda_k(n)], \qquad \lambda_k = [\lambda_k(1), \ldots, \lambda_k(n)].$$

The successive desired outputs are the vectors $Y = \{y_1, \ldots, y_K\}$, where each vector $y_k = (y_{1k}, \ldots, y_{nk})$, 
whose elements $y_{ik} \in [0, 1]$ correspond to the desired values of each neuron. The network approximates the 
set of desired output vectors in a manner that minimizes a cost function $E_k$: 

$$E_k = \frac{1}{2} \sum_{i=1}^{n} a_i (q_i - y_{ik})^2, \qquad a_i \geq 0.$$

If we wish to remove some neuron j from network output, and hence from the error function, it suffices to 
set $a_j = 0$. 
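
A short sketch of this cost function follows; it shows how a zero coefficient removes a neuron from the error. The function name and the numerical values are hypothetical and serve only as an example.

```python
import numpy as np


def quadratic_cost(q, y, a):
    """Weighted quadratic cost 0.5 * sum_i a_i * (q_i - y_i)^2."""
    q, y, a = (np.asarray(x, dtype=float) for x in (q, y, a))
    return 0.5 * float(np.sum(a * (q - y) ** 2))


q_example = [0.5, 0.2, 0.9]   # hypothetical network outputs
y_example = [0.4, 0.2, 0.1]   # hypothetical desired outputs

# All three neurons contribute to the error.
cost_all = quadratic_cost(q_example, y_example, a=[1.0, 1.0, 1.0])
# The third neuron is removed from the output by setting its coefficient to 0.
cost_without_third = quadratic_cost(q_example, y_example, a=[1.0, 1.0, 0.0])
print(cost_all, cost_without_third)
```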



Both of the n by n weight matrices $W^+_k = \{w^+_k(i,j)\}$ and $W^-_k = \{w^-_k(i,j)\}$ have to be learned after each 
input is presented, by computing for each input $\iota_k = (\Lambda_k, \lambda_k)$ a new value $W^+_k$ and $W^-_k$ of the weight 
matrices, using gradient descent. Clearly, we seek only solutions for which all these weights are positive. 

Let $w(u,v)$ denote any weight term, which would be either $w(u,v) \equiv w^-(u,v)$, or $w(u,v) \equiv w^+(u,v)$. The 
weights will be updated as follows: 

$$w_k(u,v) = w_{k-1}(u,v) - \eta \sum_{i=1}^{n} a_i (q_{ik} - y_{ik}) [\partial q_i / \partial w(u,v)]_k \tag{7}$$

where $\eta > 0$ is some constant, and 

1. $q_{ik}$ is calculated using the input $\iota_k$ and $w(u,v) = w_{k-1}(u,v)$ in equations 4 and 5. 

2. $[\partial q_i / \partial w(u,v)]_k$ is evaluated at the values $q_i = q_{ik}$ and $w(u,v) = w_{k-1}(u,v)$. 


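To compute $[\partial q_i / \partial w(u,v)]_k$ we turn to equations 4 and 5, from which we derive the following equation, 
writing $D(i) = r(i) + \lambda^-(i)$ for the denominator of equation 4: 

$$\begin{aligned}
\partial q_i / \partial w(u,v) = {} & \sum_j \partial q_j / \partial w(u,v) \, [w^+(j,i) - w^-(j,i) q_i] / D(i) \\
& - 1[u = i] \, q_i / D(i) \\
& + 1[w(u,v) \equiv w^+(u,i)] \, q_u / D(i) \\
& - 1[w(u,v) \equiv w^-(u,i)] \, q_u q_i / D(i)
\end{aligned}$$

Let $q = (q_1, \ldots, q_n)$, and define the n x n matrix 

$$W = \{ [w^+(i,j) - w^-(i,j) q_j] / D(j) \}, \qquad i, j = 1, \ldots, n.$$

We can now write the vector equations: 

$$\begin{aligned}
\partial q / \partial w^+(u,v) &= \partial q / \partial w^+(u,v) \, W + \gamma^+(u,v) \, q_u \\
\partial q / \partial w^-(u,v) &= \partial q / \partial w^-(u,v) \, W + \gamma^-(u,v) \, q_u
\end{aligned}$$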


where the elements of the n-vectors $\gamma^+(u,v) = [\gamma^+_1(u,v), \ldots, \gamma^+_n(u,v)]$, $\gamma^-(u,v) = [\gamma^-_1(u,v), \ldots, \gamma^-_n(u,v)]$ 
are 

$$\gamma^+_i(u,v) = \begin{cases}
-1/D(i) & \text{if } u = i, \; v \neq i \\
+1/D(i) & \text{if } u \neq i, \; v = i \\
0 & \text{for all other values of } (u,v)
\end{cases}$$

$$\gamma^-_i(u,v) = \begin{cases}
-(1 + q_i)/D(i) & \text{if } u = i, \; v = i \\
-1/D(i) & \text{if } u = i, \; v \neq i \\
-q_i/D(i) & \text{if } u \neq i, \; v = i \\
0 & \text{for all other values of } (u,v)
\end{cases}$$

Notice that 

$$\begin{aligned}
\partial q / \partial w^+(u,v) &= \gamma^+(u,v) \, q_u \, [I - W]^{-1} \\
\partial q / \partial w^-(u,v) &= \gamma^-(u,v) \, q_u \, [I - W]^{-1}
\end{aligned} \tag{8}$$

where I denotes the n by n identity matrix. Hence the main computational work is to obtain $[I - W]^{-1}$. 
This is of time complexity $O(n^3)$, or $O(mn^2)$ if an m-step relaxation method is used. 
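
The gradient formulas above can be cross-checked numerically. The sketch below is an illustration under stated assumptions rather than the implementation used in this work: it assumes that no spikes leave the network, so that $r(i) = \sum_j [w^+(i,j) + w^-(i,j)]$, it takes $D(i) = r(i) + \lambda^-(i)$, and it uses a small hypothetical 3-neuron network. The helper names `solve_q`, `rnn_gradients` and `finite_difference` are introduced here for the example. The analytic derivatives of equation 8 are compared with central finite differences of the solution of equations 4 and 5.

```python
import numpy as np


def solve_q(w_plus, w_minus, Lambda, lam, tol=1e-12, max_iter=10000):
    """Successive substitution for equations 4 and 5, assuming d(i) = 0."""
    r = w_plus.sum(axis=1) + w_minus.sum(axis=1)
    q = np.zeros(len(Lambda))
    for _ in range(max_iter):
        q_new = (q @ w_plus + Lambda) / (r + q @ w_minus + lam)
        if np.max(np.abs(q_new - q)) < tol:
            return q_new
        q = q_new
    raise RuntimeError("fixed-point iteration did not converge")


def rnn_gradients(w_plus, w_minus, q, lam):
    """Return arrays g with g[u, v, i] = dq_i / dw(u, v), following equation 8."""
    n = len(q)
    D = w_plus.sum(axis=1) + w_minus.sum(axis=1) + q @ w_minus + lam
    W = (w_plus - w_minus * q[None, :]) / D[None, :]  # W[i, j] uses q_j, D(j)
    inv = np.linalg.inv(np.eye(n) - W)
    dq_plus = np.zeros((n, n, n))
    dq_minus = np.zeros((n, n, n))
    for u in range(n):
        for v in range(n):
            g_plus = np.zeros(n)
            g_minus = np.zeros(n)
            if u == v:
                g_minus[u] = -(1.0 + q[u]) / D[u]
            else:
                g_plus[u] = -1.0 / D[u]
                g_plus[v] = 1.0 / D[v]
                g_minus[u] = -1.0 / D[u]
                g_minus[v] = -q[v] / D[v]
            dq_plus[u, v] = q[u] * (g_plus @ inv)
            dq_minus[u, v] = q[u] * (g_minus @ inv)
    return dq_plus, dq_minus


# Hypothetical 3-neuron recurrent network.
w_plus = np.array([[0.1, 0.3, 0.2], [0.1, 0.1, 0.4], [0.3, 0.2, 0.1]])
w_minus = np.array([[0.1, 0.1, 0.2], [0.2, 0.1, 0.1], [0.1, 0.3, 0.1]])
Lambda = np.array([0.4, 0.3, 0.2])
lam = np.array([0.1, 0.1, 0.1])

q = solve_q(w_plus, w_minus, Lambda, lam)
dq_plus, dq_minus = rnn_gradients(w_plus, w_minus, q, lam)


def finite_difference(which, u, v, eps=1e-6):
    """Central difference of q with respect to w+(u, v) or w-(u, v)."""
    mats = {"+": w_plus.copy(), "-": w_minus.copy()}
    mats[which][u, v] += eps
    hi = solve_q(mats["+"], mats["-"], Lambda, lam)
    mats[which][u, v] -= 2 * eps
    lo = solve_q(mats["+"], mats["-"], Lambda, lam)
    return (hi - lo) / (2 * eps)


fd_error = max(
    np.max(np.abs(finite_difference(s, u, v) - g[u, v]))
    for s, g in (("+", dq_plus), ("-", dq_minus))
    for u in range(3)
    for v in range(3)
)
print("max |analytic - finite difference| =", fd_error)
```

Here the matrix inverse is formed explicitly to mirror the text; solving the linear systems directly would serve equally well.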

We now have the information to specify the complete learning algorithm for the network. We first initialize 
the matrices $W^+_0$ and $W^-_0$ in some appropriate manner. This initialization will be made at random. Choose a 
value of $\eta$, and then for each successive value of k, starting with k = 1, proceed as follows: 

1. Set the input values to $\iota_k = (\Lambda_k, \lambda_k)$. 

2. Solve the system of nonlinear equations 4 and 5 with these values. 

3. Solve the system of linear equations (8) with the results of (2). 

4. Using equation 7 and the results of (2) and (3), update the matrices $W^+_k$ and $W^-_k$. Since we seek 
the "best" matrices (in terms of gradient descent of the quadratic cost function) that satisfy the 
nonnegativity constraint, in any step k of the algorithm, if the iteration yields a negative value of a 
term, we have two alternatives: 

(a) Set the term to zero, and stop the iteration for this term in this step k; in the next 
step k + 1 we will iterate on this term with the same rule starting from its current null value; 

(b) Go back to the previous value of the term and iterate with a smaller value of $\eta$. 
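
The two alternatives in step 4 can be sketched as follows. This is one possible reading, given as an assumption: `grad` stands for the matrix of terms $\sum_i a_i (q_{ik} - y_{ik}) [\partial q_i / \partial w(u,v)]_k$ from equation 7, alternative (b) is implemented by repeatedly halving the step for the offending terms only, and the function name, the halving factor and the cap on halvings are hypothetical choices.

```python
import numpy as np


def update_weights(w_prev, grad, eta, rule="clip", max_halvings=20):
    """One gradient step on a weight matrix with a nonnegativity constraint.

    rule="clip":      alternative (a), negative terms are set to zero.
    rule="backtrack": alternative (b), negative terms are recomputed from
                      their previous value with a smaller step.
    """
    w_prev = np.asarray(w_prev, dtype=float)
    grad = np.asarray(grad, dtype=float)
    w_new = w_prev - eta * grad
    negative = w_new < 0.0

    if rule == "clip":
        w_new[negative] = 0.0
    elif rule == "backtrack":
        step = np.full(w_prev.shape, float(eta))
        for _ in range(max_halvings):
            if not negative.any():
                break
            step[negative] *= 0.5
            w_new[negative] = w_prev[negative] - step[negative] * grad[negative]
            negative = w_new < 0.0
        # If a term is still negative, keep its previous value.
        w_new[negative] = w_prev[negative]
    else:
        raise ValueError("rule must be 'clip' or 'backtrack'")
    return w_new


# Hypothetical weights and gradient terms.
w_prev = [[0.2, 0.06], [0.0, 0.3]]
grad = [[1.0, 1.0], [-0.5, 0.0]]
clipped = update_weights(w_prev, grad, eta=0.1, rule="clip")
backtracked = update_weights(w_prev, grad, eta=0.1, rule="backtrack")
print(clipped)
print(backtracked)
```

In this example the term starting at 0.06 would become negative with the full step; rule (a) sets it to zero, while rule (b) halves the step once and leaves it at about 0.01.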


This general scheme can be specialized to feedforward networks yielding a computational complexity of 
$O(n^2)$, rather than $O(n^3)$, for each gradient iteration. 

